Return-Path: <icon-group-sender>
Received: from kingfisher.CS.Arizona.EDU (kingfisher.CS.Arizona.EDU [192.12.69.239])
by baskerville.CS.Arizona.EDU (8.8.7/8.8.7) with SMTP id MAA14399
for <icon-group-addresses@baskerville.CS.Arizona.EDU>; Fri, 13 Mar 1998 12:35:12 -0700 (MST)
Received: by kingfisher.CS.Arizona.EDU (5.65v4.0/1.1.8.2/08Nov94-0446PM)
id AA17700; Fri, 13 Mar 1998 12:35:12 -0700
From: gep2@computek.net
Date: Fri, 13 Mar 1998 11:30:44 -0600
Message-Id: <199803131730.LAA18482@axp.cmpu.net>
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Subject: Letter Probabilities
To: icon-group@optima.CS.Arizona.EDU
X-Mailer: SPRY Mail Version: 04.00.06.17
Errors-To: icon-group-errors@optima.CS.Arizona.EDU
Status: RO
Content-Length: 2730
> I have a table that associates individual letters (one-char strings)
> with real numbers (probabilities). We can assume for the sake of
> argument that the sum of all probabilities in my table is unity.
>
> Given this table (which I already have Icon code to obtain) what is the
> most efficient method of generating random text? What I am thinking of
> at the moment is:
>
> (1) get a sorted list of [key,value] pairs,
>     sorted by value (probability),
>     highest probability first
> (2) generate a random number from 0.0 to 1.0
> (3) use a while-loop to find the slot in the
>     sorted list where the number falls;
>     I would subtract each passing probability
>     until my placeholder value had vanished; e.g.
>
>     i := 0      # running index
>     x := ?0     # random number 0.0 - 1.0
>     while x > 0 do
>     {
>        i +:= 1
>        x -:= prob_list[i][2]
>     }
>     letter := prob_list[i][1]

You're trying to program it like you were programming in C, and that C-style
"clockwork mentality" is why you're having problems, IMHO.
What I think you ought to do is simply take your table and build a string
(just once!) in which each letter appears a number of times proportional to
its probability. Then you can replace all this silly C-style nonsense with just:

    ?letterstring

...and the letters will "automagically" come out (as many as you need) with
the probabilities you require.
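
A minimal sketch of building that string, under some assumptions: the procedure name make_letterstring and its scale parameter are hypothetical (not from the original question), the table maps one-character strings to non-negative reals, and integer() is taken to truncate a real, so adding 0.5 rounds to the nearest count.

```icon
# Hypothetical helper: build a string in which each letter appears
# roughly (probability * scale) times.
procedure make_letterstring(probs, scale)
   local s, pair
   /scale := 1000                   # assumed default resolution: 0.001
   s := ""
   every pair := !sort(probs) do    # sort(T) gives a list of [key, value] lists
      # integer() is assumed to truncate, so + 0.5 rounds to nearest
      s ||:= repl(pair[1], integer(pair[2] * scale + 0.5))
   return s
end

procedure main()
   local probs, letterstring
   probs := table(0.0)              # illustrative probabilities only
   probs["a"] := 0.5
   probs["b"] := 0.3
   probs["c"] := 0.2
   letterstring := make_letterstring(probs, 10)   # coarse scale, short string
   every 1 to 40 do
      writes(?letterstring)         # ?s picks a random one-character substring
   write()
end
```

The resolution is limited by scale: a letter whose probability is below roughly 1/(2*scale) gets no copies at all, so a larger scale trades a longer string for finer probabilities.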
In fact, you don't even need to start with your "probabilities" table at all...
you can just take your demonstration text, put it all in a string and remove any
characters you don't want there (quotation marks and other punctuation for
example) and you're all set.
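
Here is a rough sketch of that shortcut, again only under assumptions: the helper name keep_chars and the sample sentence are made up for illustration, and folding everything to lower case with map() is my own choice rather than something the question asked for.

```icon
# Hypothetical helper: return only those characters of text
# that are members of the cset keep.
procedure keep_chars(text, keep)
   local s
   s := ""
   text ? {
      while tab(upto(keep)) do      # skip ahead to the next wanted character
         s ||:= tab(many(keep))     # collect the run of wanted characters
   }
   return s
end

procedure main()
   local sample, letterstring
   sample := "The quick brown fox, they say, \"jumps\" over the lazy dog."
   # Keep letters only, then fold upper case to lower case.
   letterstring := map(keep_chars(sample, &letters))
   every 1 to 40 do
      writes(?letterstring)         # each letter appears as often as in the sample
   write()
end
```

Because the sample text itself serves as the pool, the letter frequencies come for free and no probability table or scaling step is involved.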
> This all seems rather awkward to me, especially step (3).
Yup, exactly, that's because you're using C-type programming mentality instead
of embracing an Icon-native approach to the problem.
> Isn't there some construct in Icon that could do this more elegantly?
You BETCHA there is. See above. :-)
> P.S. I am already very well acquainted with the sample program
> 'monkeys.icn' in the distribution. This program uses multiple-character
> sequences, not individual letter probabilities. More characters gives a
> better approximation to the source language, but I am interested
> specifically in single-character probabilities right at the moment.

You also ought to buy a copy of the fascinating book "Algorithms in SNOBOL4" by
Gimpel (sold in facsimile reprint by Catspaw). It contains some exceedingly
nicely done programs (and very, very useful functions), including some that
deal with random text generation.
Gordon Peterson
http://www.computek.net/public/gep2/
Support the Anti-SPAM Amendment! Join at http://www.cauce.org/